SLIP

Research Note I

November 27, 2001

Obtaining Informational Transparency with Selective Attention

Dr. Paul S. Prueitt

President, OntologyStream Inc

One needs the zip file amSLIP.zip to follow this Research Note.

Overview

Over the past few weeks I have been concerned with how the structure of a cluster might be represented graphically. A number of issues are involved.

1) The completeness and consistency of the graphical representation of a category (see Figure 3)

2) The size of the category or the size of various report results. The sheer amount of data may make the relationships difficult to understand.

3) Results from other data mining techniques. There are other data mining techniques that can supplement the SLIP non-specific relationship

Visualization and cognitive aids (sense-making and decision-making aids) are essential to all three issues.

Future research notes will address the issues of size and the supplementation from other data mining.

Cylant IDS data is currently being studied as a supplement to the SLIP technology.

Software development issues:

1) Completing the SLIP Warehouse Browser so that all input files (paired.txt and Datawh.txt) that are required by the SLIP Technology Browser version 2.2.0 are made available.

2) Reviewing and developing the non-database processes using Referential Information Base (RIB) techniques.

This Research Note addresses the question of where we are in our project.

I am looking to complete an operational system based on SLIP technology and related technologies, in support of Information Assurance. The estimate for the completion of this system is 3 – 6 months. A proof of concept can be made based on the software that I currently have finished.

On completeness and consistency

Much of my time this week has been spent refining the ability to generate reports, sort the display of the report, organize the information in the report, and visualize event properties. I have some preliminary work to show, and some design work. I am also making small corrections in the software code.

Figure 1: Version 2.2.0 with the help file

The current sort function sorts by column, but the sort is based on ASCII order, so numeric values are ordered as text rather than by magnitude. As I complete the response object, the user will see additional response comments that explain what has happened when the command “sort n” is executed (where n is between 1 and the total number of columns).
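
The difference between ASCII order and numeric order can be illustrated with a short Python sketch. It is not the Browser's code: the function names sort_report and port_number are hypothetical, and the assumption that the fifth whitespace-separated field holds the d_port label is inferred from the records quoted in this note.

```python
def port_number(value):
    """Numeric part of a label such as "d1418" (assumed label format)."""
    return int(value.lstrip("sd"))


def sort_report(rows, n, key=str):
    """Sort rows (lists of strings) by column n, counted from 1 as in "sort n".

    With the default key the comparison is plain string (ASCII) order.
    """
    if not rows:
        return []
    if not 1 <= n <= len(rows[0]):
        raise ValueError("n must be between 1 and the number of columns")
    return sorted(rows, key=lambda row: key(row[n - 1]))


# The six category C1 records quoted later in this note.
report = [line.split() for line in (
    "989615966 24007 0 s48745 d941 208.205.160.42 208.205.160.42 tcp",
    "989615951 23988 0 s48745 d900 208.205.160.42 208.205.160.42 tcp",
    "989615969 24011 0 s48745 d790 208.205.160.42 208.205.160.42 tcp",
    "989615958 23996 0 s48745 d780 208.205.160.42 208.205.160.42 tcp",
    "989615955 23992 0 s48745 d629 208.205.160.42 208.205.160.42 tcp",
    "989615960 23999 0 s48745 d1418 208.205.160.42 208.205.160.42 tcp",
)]

ascii_order = [row[4] for row in sort_report(report, 5)]
numeric_order = [row[4] for row in sort_report(report, 5, key=port_number)]
print(ascii_order)    # d1418 sorts first because "1" precedes "6" as a character
print(numeric_order)  # d1418 sorts last because 1418 is the largest port number
```

Passing the key as a parameter keeps the ASCII behavior as the default while letting a numeric ordering be chosen per column.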

Figure 2: Sort function

The determinants of the current Report are underconstrained. This is as it should be. Additional constraints are generally imposed by SQL clauses. However, the data itself is incomplete. Without some top-down expectancy, the predictive potential of the SLIP technology will NOT be available.

The current expectancy is problematic for both human experts and artificial intelligence. It is this problem that the SLIP Technology is addressing most directly.

The development of the category substructure

I am refining some of the background processes, particularly related to the development of event maps (see Figure 4). These processes are producing a predictive capability.

The current Report file has some records that are not relevant to the category. The current retrieval uses the atoms one at a time and delivers to the Report file each record in Datawh.txt whose designated column value matches the value of the atom.
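
A minimal Python sketch of this atom-by-atom retrieval follows. The function name retrieve is hypothetical, the records are held in a list rather than read from Datawh.txt, and the designated column is assumed to be the fifth whitespace-separated field (the d_port label in the records quoted in this note).

```python
def retrieve(records, atoms, column):
    """Collect every record whose value in `column` (1-based) equals an atom.

    The atoms are tried one at a time, as the retrieval described above does,
    and no other field of the record is consulted.
    """
    report = []
    for atom in atoms:
        for rec in records:
            if rec.split()[column - 1] == atom:
                report.append(rec)
    return report


# Three records quoted elsewhere in this note, standing in for Datawh.txt.
warehouse = [
    "989615955 23992 0 s48745 d629 208.205.160.42 208.205.160.42 tcp",
    "991024980 12014 0 s21794 d629 207.172.106.87 208.205.160.42 tcp",
    "989615966 24007 0 s48745 d941 208.205.160.42 208.205.160.42 tcp",
]

for rec in retrieve(warehouse, ["d629"], 5):
    print(rec)
```

Both d629 records are returned, including the one whose s_port (s21794) appears nowhere else in the category. Matching on the atom value alone cannot tell the two apart, which is why a further filter is needed.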

In addition to the conceptual work, the theoretical work, and the software design and coding, there are some current technical issues. I work these issues continuously. For example, in the current Report file, some of the values do not have a “standing” in the category.

By this, one means that the theoretical issue of completeness and consistency has yet to produce algorithms that filter the current report. The theory is clear, but the algorithm is not yet completed. Once the filter is in place, then the software is just more interesting to the user. The theory never has to be understood by the user. But the software must reflect a grounded theory of knowledge and knowledge management.

Let us look at some detail. If a record is in the Report, then it is in the Report because of a computed match to an atom value. The theory tells us that the relationship between two atoms is why the category exists. The theory also tells us that, given any two atoms in the record, there should be one or more “b” values that have established that relationship. If this “b” value is not relevant to the global event map, then the record should be eliminated from the Report. But how to do this?

The answer to this question is clearly made in the presentations that I have delivered on a global stratified taxonomy. The answer MUST BE in the development of a common language used by the human community AND must be related to formal constructs that reflect SLIP data aggregation and other data mining processes. Why can we be so sure of where the answers MUST come from? The theory tells us this, and given the experimental trial use of the software, the domain experts will support the same conclusions that I have derived from SLIP theory.

There must be a human element, based on a common taxonomy of terms, to our solution AND a technological solution based on abstractions. These abstractions are NECESSARY to encode the patterns of invariance that occur at (1) the bit stream level and (2) the level of summary results from Intrusion Detection Systems such as RealSecure or Cylant. The abstractions may present a barrier to using the technology, but this barrier can be overcome easily IF we use the Browsers to examine real data that the domain expert is interested in. The user will quickly adopt the necessary abstractions and start to use the event maps as a means to reason about and share knowledge about the types and classes of events that are occurring in the Internet, in real time.

A technical note should be made at this point. The use of different data sources is vital if the SLIP technology is to be flexible and general in nature. In Figure 3 we see the trace of the event record,

[1] 991024980 12014 0 s21794 d629 207.172.106.87 208.205.160.42 tcp

(Actually the record occurs three times.) Other records also occur three times, and we have to figure out why the records are retrieved more than once. Both the FoxPro code and the VB code generate exactly the same Report. This is likely a process design flaw or something about the Cylant data set.

The current focus on one RealSecure data set only attends to part of the available data sources for the SLIP Technology. The development team must have a richer data set and more contact with the domain experts.

Figure 3: The extra records in category C1

Returning to the technical detail, we can see that much has been gained during our IRAD project. But significant work remains to be done to make the technical detail seamless.

Using the Cylant data set, we have an example of a record that should not be reported at all or should be reported as an “external fact”. Again, due to the well-developed theory, we know that each atom has a set of non-specific relationships that are determined by a common “b” value. The theory is well developed in formal theorems in my collaboration with several mathematicians and is well developed in algorithmic specification in my collaboration with computer scientists. Similar results can be created using the RealSecure data.

I concluded that if there is a non-specific relationship to atoms NOT in a specific cluster, then this relationship must be seen as an “external” valence. I then evolved the software in the direction of showing two categories of relationships between atoms and atom compounds: the external relationship and the internal relationship (see Figure 4). The concept of valence is well developed in my published work over a period of ten years.

The concept of valence suggests that a chemistry of event composition can be reflected in the event maps. As the event maps are shared with a secure community, one expects that the maps will facilitate rapid communication within the community. The event maps also can be used to automate the construction and use of Petri Nets and rule bases.

The automated construction of Petri nets and rule sets has not been tested, but I have explored privately with a PhD candidate at the University of Idaho the process whereby the SLIP and Petri Net technology can be integrated. My discussions within my close community are always made with an agreement to professional confidentiality and with limited sharing of knowledge about my relationship to my clients. The purpose of these discussions is to know where the leading edge is in detection technology and vulnerability analysis.

Let us return to a discussion about event maps. Consider category C1 (available in amSLIP.zip). The six atoms all have a chaining relationship; we know this because cluster C1 is prime (all atoms quickly move to the same location). The chain is in fact represented as the set of ordered triples (called “syntagmatic units” in semiotic theory):

{ <d941, s48745, d900>, <d900, s48745, d790>, <d790, s48745, d780>, <d780, s48745, d629>, <d629, s48745, d1418> }

A transitive relationship, a * b and b * c → a * c, allows us to indicate 6 links (in red) as the above set or as the fully enumerated set with 24 links (in red and blue).
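
As a rough illustration of how transitivity expands a chain, the Python sketch below closes the listed triples under the rule a * b and b * c → a * c. It assumes the relation is symmetric, which is an assumption of the sketch rather than something stated in this note, and the function name transitive_closure is hypothetical. The way links are counted in Figure 4 may differ from what the sketch enumerates, so it is not meant to reproduce the figure.

```python
triples = [
    ("d941", "s48745", "d900"),
    ("d900", "s48745", "d790"),
    ("d790", "s48745", "d780"),
    ("d780", "s48745", "d629"),
    ("d629", "s48745", "d1418"),
]


def transitive_closure(links):
    """Repeatedly apply a*b, b*c -> a*c until no new ordered pair appears.

    The relation is treated as symmetric (an assumption of this sketch),
    so each link is entered in both directions before closing.
    """
    closure = set(links) | {(b, a) for a, b in links}
    while True:
        new = {(a, d)
               for a, b in closure
               for c, d in closure
               if b == c and a != d}
        if new <= closure:
            return closure
        closure |= new


# Links stated directly by the chain, ignoring the shared "b" value.
direct = [(a, c) for a, _b, c in triples]

# Collapse the two directions of each pair into one unordered link.
implied = {frozenset(pair) for pair in transitive_closure(direct)}

for pair in sorted(tuple(sorted(p)) for p in implied):
    print(pair)
```

Every atom ends up linked to every other atom, which is the sense in which the chain makes the whole category a single unit.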

Figure 4: The event map for category C1

After [1] is removed and the duplicates are removed we have the following Report for category C1.

989615966 24007 0 s48745 d941 208.205.160.42 208.205.160.42 tcp

989615951 23988 0 s48745 d900 208.205.160.42 208.205.160.42 tcp

989615969 24011 0 s48745 d790 208.205.160.42 208.205.160.42 tcp

989615958 23996 0 s48745 d780 208.205.160.42 208.205.160.42 tcp

989615955 23992 0 s48745 d629 208.205.160.42 208.205.160.42 tcp

989615960 23999 0 s48745 d1418 208.205.160.42 208.205.160.42 tcp
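
The clean-up that yields this Report can be sketched in Python as follows. The filtering rule is an assumption made for illustration only, since the filtering algorithm is described above as not yet completed: a record is kept when its “b” value (taken here to be the s_port in the fourth field) is shared by at least two atoms of the category. The function name clean_report and the column positions are likewise hypothetical.

```python
from collections import defaultdict


def clean_report(records, atoms, b_col=4, a_col=5):
    """Drop verbatim duplicates, then drop records whose "b" value
    links fewer than two atoms of the category (assumed rule)."""
    unique = list(dict.fromkeys(records))  # first occurrence wins, order kept
    linked = defaultdict(set)              # b value -> atoms it touches
    for rec in unique:
        fields = rec.split()
        if fields[a_col - 1] in atoms:
            linked[fields[b_col - 1]].add(fields[a_col - 1])
    return [rec for rec in unique
            if len(linked[rec.split()[b_col - 1]]) >= 2]


c1_atoms = {"d941", "d900", "d790", "d780", "d629", "d1418"}
record_1 = "991024980 12014 0 s21794 d629 207.172.106.87 208.205.160.42 tcp"
six = [
    "989615966 24007 0 s48745 d941 208.205.160.42 208.205.160.42 tcp",
    "989615951 23988 0 s48745 d900 208.205.160.42 208.205.160.42 tcp",
    "989615969 24011 0 s48745 d790 208.205.160.42 208.205.160.42 tcp",
    "989615958 23996 0 s48745 d780 208.205.160.42 208.205.160.42 tcp",
    "989615955 23992 0 s48745 d629 208.205.160.42 208.205.160.42 tcp",
    "989615960 23999 0 s48745 d1418 208.205.160.42 208.205.160.42 tcp",
]

# Record [1] retrieved three times, mixed in with the six records.
raw = six[:3] + [record_1] * 3 + six[3:]

cleaned = clean_report(raw, c1_atoms)
for rec in cleaned:
    print(rec)
```

Under this rule record [1] is dropped because s21794 touches only d629, while s48745 touches all six atoms and its records are kept.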

The event map in Figure 4 is hand drawn from data that is now derived from my FoxPro programs and data structures. I have almost finished redesigning a data aggregation process so that these drawings are automatically produced in the SLIP Technology Browser and viewed as indicated in Figure 5.

Figure 5: Mock-up of how the event maps will be viewed using the Technology Browser

Clearly, the event maps will characterize global events and provide a common means to discuss these events. In Figure 5 I look ahead to how the event maps will be automatically generated and displayed using the SLIP Technology Browser. I am developing this work as rapidly as I possibly can.

A second event map

In category E2, we have the following set of syntagmatic units:

{ <d520, s1024, d3130>, <d3130, s1024, d161>, <d161, s1024, d520> }

and also

{ <d3128, s2417, d568> }

One can see the two clusters by looking in the Members window.
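
The same two clusters can be recovered from the syntagmatic units alone. The Python sketch below groups atoms that are connected through any triple, using a small union-find structure. This is a generic technique chosen for illustration and is not claimed to be how the Browser computes the Members window; the function names find and clusters are hypothetical.

```python
def find(parent, x):
    """Follow parent pointers to the representative of x, shortening the path."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def clusters(triples):
    """Group atoms that are connected through at least one chain of triples."""
    parent = {}
    for a, _b, c in triples:
        parent.setdefault(a, a)
        parent.setdefault(c, c)
        parent[find(parent, a)] = find(parent, c)  # merge the two groups
    groups = {}
    for atom in parent:
        groups.setdefault(find(parent, atom), set()).add(atom)
    return sorted(sorted(group) for group in groups.values())


e2_triples = [
    ("d520", "s1024", "d3130"),
    ("d3130", "s1024", "d161"),
    ("d161", "s1024", "d520"),
    ("d3128", "s2417", "d568"),
]

e2_clusters = clusters(e2_triples)
for group in e2_clusters:
    print(group)
```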

There are three atoms at degree 306-307 and two atoms at degree 156 (Figure 6). By manually looking into the report we find the source ports that bind the links together (Figure 7). Anyone who has the zip file amSLIP.zip (686 K) can click on category E2 and magnify to 200 by typing “mag 200” in the command line.

Figure 6: Clustered patterns for a small group of 10 atoms

We will always be able to draw an event map (like Figure 4 or Figure 7) for any category. Event map representation can be automated even for the full set of atoms in A1, thus giving one-stop visualization of a complete day’s data set from any IDS.

Figure 7: Event map for category E2

Because of event-map automation, the concept of an automatically produced SLIP Framework is even more interesting. The ending nodes of an automatically produced Framework will each have the character of Figure 4. Common graphic patterns are easily recognized. Complete visual summarization for a day or week or even a month can be produced, saved and printed.

The event map (Figure 7) shows a link to port 80 from both the {3130, 161, 520} cluster and the {3128, 568} cluster. By looking at Figure 8 one can follow how category E2 was produced. The large cluster in Figure 8 contains two ports, 80 and 113, that together link the entire large cluster (all 72 atoms).

I moved the large cluster into the category B1, and then removed just these two atoms manually so that the clustering process was fractured. We can now see the structure of links with d80 and d113 not in the category. Later, after finding the two small primes (in Figure 7), I checked manually and found that the two primes are in fact related to d80. This is indicated as an external link (dotted line). This means that the two primes (prime at the E level) are both part of a larger prime at the B level.

The event map in Figure 6 shows a non-specific relationship that links together two small groups of defensive ports (because a common s_port is used to access both port groups during the time of the global event). This is Cylant data and I do not have any idea what the global event was.

Figure 8: Cluster pattern for the Cylant data set (amSLIP.zip)

In Figure 8 we have the top node of the Cylant data set. The Cylant data is taken from a behavioral study of the Linux kernel on a computer that is connected to the Internet as a type of HoneyNet.

Figure 9: The mock up for the SLIP Warehouse Browser

The screen in Figure 8 shows the analytic conjecture for the Cylant data. The analytic conjecture is that d_port values are related by having a common s_port.
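
A minimal Python sketch of this conjecture follows: two d_port values are linked whenever some s_port occurs with both. The function name link_by_common_sport is hypothetical, the column positions are assumed from the records quoted in this note, and the sketch does not claim to reproduce the format of paired.txt.

```python
from collections import defaultdict
from itertools import combinations


def link_by_common_sport(records, s_col=4, d_col=5):
    """Return (d_port, s_port, d_port) triples for every pair of d_port
    values that share an s_port somewhere in the records."""
    by_sport = defaultdict(set)
    for rec in records:
        fields = rec.split()
        by_sport[fields[s_col - 1]].add(fields[d_col - 1])
    links = set()
    for s_port, d_ports in by_sport.items():
        for a, c in combinations(sorted(d_ports), 2):
            links.add((a, s_port, c))
    return links


records = [
    "989615966 24007 0 s48745 d941 208.205.160.42 208.205.160.42 tcp",
    "989615951 23988 0 s48745 d900 208.205.160.42 208.205.160.42 tcp",
    "991024980 12014 0 s21794 d629 207.172.106.87 208.205.160.42 tcp",
]

sport_links = link_by_common_sport(records)
print(sport_links)
```

Only d900 and d941 are linked, through s48745. The s21794 record produces no link because its s_port occurs with a single d_port.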

The drill down

One of our sources of data is a RealSecure event log from April 15th. I have around 14,000 RealSecure IDS records where each of these records is a signature produced in response to some event defined by the IDS as an intrusion event. I also have around 68,000 records from a Cylant IDS log file.

Any one of the SLIP prime categories will identify a small subset of these 14,000 records, AS WELL AS create an abstraction based on categorization of a number of similar RealSecure intrusion events. Each of these RealSecure intrusion events is a single event in the 14,000 records. The abstraction may be something like an event that involves port 80 and port 113, as seen in Figures 4 and 7.

This abstraction has various uses.

1) The abstraction is a query that can be used against the original data OR against new data.

2) The abstraction codifies a pattern of data that occurs more than once. The pattern involves more than one RealSecure event record, AND the pattern occurs in more than one of the more globally defined situations. Abstractions are easy to use once the context of the abstraction is made real in specific situations. For example, we use abstractions when we count: one, two, three, etc. Rendering the SLIP abstractions as event maps makes these abstractions very easy to see and use.

3) The abstraction is one of a small class of abstractions that when talked about between analysts results in a selective attention to real patterns of occurrence. The patterns and the abstractions can be used to understand events that are more global than what the event log data source is recording.

The drill down into a pre-defined event can occur in at least two ways:

1) The pre-defined event is something that has been identified by a CERT as something that occurred on April 15th, 2001 (for example). The SLIP software is NOW developed to the point of being usable in constructing a set of abstractions (for example about RealSecure data patterns) that will illustrate the nature of this, or ANY, pre-defined event. We have not been spending the time that is required to do this because the client is busy with current events. Thus I have been developing the software so that analysts can use the tools in a secure environment for real-time event analysis. So any new data source can NOW be used to develop an understanding of the patterns of data and to render each of the patterns as an abstraction with a simple visualization as an event map. My examination of the Cylant data illustrates this generality.

2) Each pre-defined event (we have only one pre-defined event provided to me) will produce a small set of abstractions. Again these abstractions are patterns of data occurrence. Given a pattern, there is more than one instance of the pattern, and thus the pattern itself IS an abstraction.

The event maps give a simple visualization of these patterns.

Once one has a number of event maps, these event maps can be used to retrieve (using standard query language) all records that have the pattern of relationships in the event map composition. This retrieval capability is available for the audit trail of any pre-defined event, as well as for predicting, in real time and from incomplete RealSecure data records, the potential occurrence of an event similar to the pre-defined event.
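
As one possible reading of this retrieval, the Python sketch below turns an event map (a “b” value plus the atoms it binds) into a parameterized SQL query, using the sqlite3 module from the Python standard library. The table name events, the column names stamp, s_port and d_port, and the function name records_matching are all hypothetical; this note does not specify a schema.

```python
import sqlite3


def records_matching(conn, b_value, atoms):
    """Fetch the rows whose s_port is the event map's "b" value and
    whose d_port is one of the event map's atoms."""
    marks = ",".join("?" * len(atoms))  # one placeholder per atom
    sql = (
        "SELECT stamp, s_port, d_port FROM events "
        f"WHERE s_port = ? AND d_port IN ({marks}) ORDER BY stamp"
    )
    return conn.execute(sql, [b_value, *atoms]).fetchall()


# In-memory table holding three records quoted in this note (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (stamp INTEGER, s_port TEXT, d_port TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        (989615966, "s48745", "d941"),
        (989615951, "s48745", "d900"),
        (991024980, "s21794", "d629"),
    ],
)

matches = records_matching(conn, "s48745", ["d941", "d900", "d629"])
for row in matches:
    print(row)
```

The same query can be run against the original data or against newly arriving records, which is the sense in which an event map acts as a reusable query.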

The nature of atoms

In Table 1 we have the full specification of the six atoms in Figure 7. The data structure required to encode this data is an object. The atom object is now programmatically available as part of the SLIP Technology Browser.
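
A possible shape for such an object is sketched below in Python. The fields are hypothetical and are not taken from Table 1 or from the Browser's actual atom object: the sketch records only a label and the internal and external links discussed earlier in this note. The “b” value of the external link to d80 is not given in this note, so None stands in for it.

```python
from dataclasses import dataclass, field


@dataclass
class Atom:
    """Hypothetical atom record: a label plus its internal and external links."""
    label: str
    internal: set = field(default_factory=set)  # links to atoms inside the category
    external: set = field(default_factory=set)  # links to atoms outside the category

    def add_link(self, other, category, b_value=None):
        """File a link under internal or external, depending on whether
        the other atom belongs to the category."""
        side = self.internal if other in category else self.external
        side.add((b_value, other))


# One of the two small primes found in category E2.
e2_prime = {"d520", "d3130", "d161"}

atom = Atom("d520")
atom.add_link("d3130", e2_prime, "s1024")  # inside the prime
atom.add_link("d80", e2_prime)             # outside the prime; b value not given

print(atom)
```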